Integrating socio-psychological factors in the SEIR model optimized by a genetic algorithm for COVID-19 trend analysis

The global spread of COVID-19 has profoundly affected health and economies, highlighting the need for precise epidemic trend predictions for effective interventions. In this study, we used infectious disease models to simulate and predict the trajectory of COVID-19. An SEIR (susceptible, exposed, infected, removed) model was established using Wuhan data to reflect the pandemic. We then trained a genetic algorithm-based SEIR (GA-SEIR) model using data from a specific U.S. region and focused on individual susceptibility and infection dynamics. By integrating socio-psychological factors, we achieved a significant enhancement to the GA-SEIR model, leading to the development of an optimized version. This refined GA-SEIR model significantly improved our ability to simulate the spread and control of the epidemic and to effectively track trends. Remarkably, it successfully predicted the resurgence of COVID-19 in mainland China in April 2023, demonstrating its robustness and reliability. The refined GA-SEIR model provides crucial insights for public health authorities, enabling them to design and implement proactive strategies for outbreak containment and mitigation. Its substantial contributions to epidemic modelling and public health planning are invaluable, particularly in managing and controlling respiratory infectious diseases such as COVID-19.


SEIR model
Data preparation and theoretical basis of the standard SEIR model
First, we set the time series to span 150 days, with daily COVID-19 data collected in Wuhan City, China. Each day, a health status assessment is conducted for all individuals, considering their state on the previous day [9]. The standard SEIR model divides health status into four defined population groups: S (susceptible), E (exposed), I (infective) and R (removed) [10]. The model assumes that the population is evenly mixed, without special isolation or the implementation of relevant policies [11]. The transfer process of COVID-19 transmission and the parameters used in this study are shown in Fig. 1. S(t) denotes healthy individuals who have not been infected with the virus and lack immunity, E(t) denotes individuals who are in the incubation period of infection after effective contact with an infected patient, I(t) denotes individuals who are symptomatic and infectious, and R(t) denotes individuals who have either recovered from or passed away due to the disease [12].
S(t), E(t), I(t) and R(t) represent the total number of individuals in each group at time t. In addition, the total population of the city is kept constant at N, and special circumstances such as population flow are not considered, so that S(t) + E(t) + I(t) + R(t) = N [13].
For group interactions, we set the effective contact rate between susceptible persons and infected patients as α. The probability of the exposed group becoming infected is represented by β, which is the reciprocal of the incubation period in days. The cure rate, δ, indicates the probability that an individual in the infected group is removed, assuming no reinfection after recovery [14]. The mortality rate due to COVID-19, µ_1, affects the transition of individuals from the infectious group I(t) to the removed group R(t); it describes the rate at which symptomatic and infectious individuals die from the disease. Finally, ν is the vaccination rate of the susceptible group, with vaccinated individuals transitioning from E(t) to R(t), and natural mortality, represented by µ, affects all compartments and captures deaths not directly related to the disease [15,16].
The details within Fig. 1 are expressed by the system of differential equations in Eq. (1) [17].
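A system consistent with the parameter definitions above, and with the disease-free value S = µ/(µ + ν) used in the stability analysis, could be written as follows. This is a sketch under stated assumptions rather than the authors' own equations: births are assumed to balance natural deaths at rate µN so that N stays constant, vaccination is assumed to move individuals out of S, and both recovery (δ) and disease-related death (µ_1) are assumed to feed R.

```latex
\begin{aligned}
\frac{dS}{dt} &= \mu N - \frac{\alpha S I}{N} - (\mu + \nu)\, S, \\
\frac{dE}{dt} &= \frac{\alpha S I}{N} - (\beta + \mu)\, E, \\
\frac{dI}{dt} &= \beta E - (\delta + \mu_1 + \mu)\, I, \\
\frac{dR}{dt} &= (\delta + \mu_1)\, I + \nu S - \mu R .
\end{aligned}
```

Summing the four right-hand sides gives µN − µ(S + E + I + R), which is zero whenever S + E + I + R = N, so this assumed form preserves the constant total population.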

Determine the transmission equilibrium point
On the initial day (day zero) of transmission, the system is considered to be in equilibrium, meaning that the rate of change of the size of each group (S, E, I, R) over time is zero; in other words, their numbers remain constant. To find the equilibrium point, we set the left-hand sides of the model's differential equations to zero, which yields four equations [18].
At this point, we normalize the population groups by the total population (N) and obtain Eq. (2). Since both day zero and the day of epidemic elimination are at the disease-free equilibrium point [18], the E and I values are zero at these times. These values are then substituted into Eq. (2), which gives Eq. (3) and, in turn, Eq. (4). At the virus equilibrium point of the COVID-19 epidemic, transmission has stabilized and reached a dynamic equilibrium state in which the population size of each group remains constant. This implies that the rate of change of the number of individuals over time is zero [7], as shown in Eq. (5). Therefore, the equilibrium point K_0 of COVID-19 is represented by the values (S_e, E_e, I_e, R_e).
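As a worked illustration of the disease-free case, suppose the normalized susceptible equation takes the form ds/dt = µ − αsi − (µ + ν)s with s = S/N, e = E/N, i = I/N and r = R/N. This form is an assumption chosen to match the value S = µ/(µ + ν) quoted in the stability analysis; it is not taken from the source equations. Setting e = i = 0 and ds/dt = 0 then gives:

```latex
\begin{aligned}
0 &= \mu - (\mu + \nu)\, s^{*}
  \;\Longrightarrow\; s^{*} = \frac{\mu}{\mu + \nu}, \\
e^{*} &= i^{*} = 0, \qquad
r^{*} = 1 - s^{*} = \frac{\nu}{\mu + \nu} .
\end{aligned}
```

Under this assumption, a higher vaccination rate ν lowers the equilibrium susceptible fraction and raises the removed fraction by the same amount.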

Assessing the stability of the equilibrium point
To understand the spread of an epidemic and predict its further development with greater accuracy [20], we need to assess the stability of the equilibrium point in both situations: the disease-free equilibrium and the equilibrium during epidemic spread.

Disease-free equilibrium
For the disease-free equilibrium point K_0 observed on day zero of the outbreak, we substitute it into the Jacobian matrix to obtain J(K_0) [21] and determine its stability. Stability is determined by the eigenvalues derived from the characteristic equation under the conditions S = µ/(µ + ν) and I = 0 in the model [22]. The four eigenvalues are expressed in Eq. (6). When λ_1 < 0, λ_2 < 0, λ_3 < 0 and λ_4 < 0, this equilibrium is stable, suggesting a lack of disease spread under current conditions.
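The eigenvalue test can be sketched numerically. The snippet below assumes a normalized model of the form ds/dt = µ − αsi − (µ + ν)s, de/dt = αsi − (β + µ)e, di/dt = βe − (δ + µ_1 + µ)i, dr/dt = (δ + µ_1)i + νs − µr; this form and the function names `dfe_eigenvalues` and `is_stable` are illustrative assumptions, not the authors' Eq. (6). Under that assumption the Jacobian at the disease-free point is block triangular, so two eigenvalues can be read off directly and the other two come from a 2 x 2 block.

```python
import cmath


def dfe_eigenvalues(alpha, beta, delta, mu, mu1, nu):
    """Eigenvalues of the Jacobian at the disease-free equilibrium.

    Assumed model (illustrative only): vaccination leaves S at rate nu,
    natural mortality mu acts on every compartment, and s0 = mu / (mu + nu).
    """
    s0 = mu / (mu + nu)
    # The S and R directions decouple at the disease-free point.
    lam1 = complex(-(mu + nu))
    lam2 = complex(-mu)
    # Remaining E-I block: [[-(beta+mu), alpha*s0], [beta, -(delta+mu1+mu)]].
    a = -(beta + mu)
    d = -(delta + mu1 + mu)
    trace = a + d
    det = a * d - alpha * s0 * beta
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return [lam1, lam2, (trace + root) / 2.0, (trace - root) / 2.0]


def is_stable(eigenvalues):
    """Locally stable when every eigenvalue has a negative real part."""
    return all(lam.real < 0 for lam in eigenvalues)


# Made-up parameter values, for illustration only.
with_vaccination = dfe_eigenvalues(0.5, 0.2, 0.1, 0.01, 0.01, 0.05)
without_vaccination = dfe_eigenvalues(0.5, 0.2, 0.1, 0.01, 0.01, 0.0)
```

With these made-up values, the vaccinated case keeps all real parts negative, whereas removing vaccination pushes one eigenvalue of the E-I block above zero.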

COVID-19's equilibrium point
Following the same methodology, the equilibrium point during the epidemic spread is substituted into the Jacobian matrix, and the eigenvalues of the resulting matrix are calculated [24].
After a series of substitutions, we obtain Eq. (7). The four eigenvalues are given in Eq. (8), and the conditions for the stability of the COVID-19 equilibrium are summarized in Eq. (9). According to the equations of the SEIR model, we compute R_0 as detailed in Eq. (10); the result is shown in Eq. (11). Additionally, S = µ/(µ − υ) is substituted to obtain the basic reproduction number in Eq. (12). Finally, the stability of the equilibrium point is further evaluated based on R_0, determining whether it is globally stable, locally stable, or unstable.
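For orientation, a next-generation-matrix calculation for an SEIR model of this type runs as follows. It assumes that new infections enter E at rate αsi, that individuals leave E at rate β + µ, and that they leave I at rate δ + µ_1 + µ; the matrices F and V and the symbol s* (the equilibrium susceptible fraction) are notation introduced here, and the result is a sketch, not a reproduction of the authors' Eqs. (10) to (12).

```latex
\begin{aligned}
F &= \begin{pmatrix} 0 & \alpha s^{*} \\ 0 & 0 \end{pmatrix}, \qquad
V = \begin{pmatrix} \beta + \mu & 0 \\ -\beta & \delta + \mu_1 + \mu \end{pmatrix}, \\
R_0 &= \rho\!\left(F V^{-1}\right)
     = \frac{\alpha\, \beta\, s^{*}}{(\beta + \mu)(\delta + \mu_1 + \mu)} .
\end{aligned}
```

In this assumed form, R_0 < 1 is equivalent to the E-I block of the Jacobian having a positive determinant, which is the condition for the disease-free equilibrium to be locally stable.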

Incorporation of socio-psychological factors
To integrate socio-psychological factors into the GA-SEIR model, we modified the transmission rate (α) and removed rate (δ) based on quantified indices of public behavior. We defined indices for compliance (C) and sentiment (S_1), each ranging from 0 to 1, where 0 indicates no compliance or negative sentiment, and 1 indicates full compliance or positive sentiment. These indices were derived from aggregated data collected from biweekly social surveys and behavioral studies.
To ensure a smooth transition between the theoretical framework and practical application, dynamic adjustments are made to the model parameters. The transmission rate (α) is adjusted by a factor that depends on the compliance index C, where k_α is a scaling factor that determines how strongly noncompliance affects the transmission rate. Similarly, the removed rate (δ) is modified based on the sentiment index S_1, where k_δ is a scaling factor that amplifies the removed rate in response to positive public sentiment.
The socio-psychological indices C and S_1 are updated biweekly to capture the latest survey data. These updates are then applied to the model to dynamically adjust the transmission and removed rates, ensuring that the model parameters reflect current public behaviors and attitudes.
By incorporating these adjustments, the differential equations for the SEIR model are updated with the adjusted transmission and removed rates. With these formulas, the model dynamically adapts to the impact of socio-psychological factors on the transmission dynamics of COVID-19, offering a nuanced understanding of how public behavior influences epidemic trends.
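One way such adjustments could look in code is sketched below. The linear forms α(1 + k_α(1 − C)) and δ(1 + k_δ S_1) are assumptions chosen to match the verbal description (noncompliance raises transmission, positive sentiment amplifies removal); the exact formulas are not given in this text. The function names and the survey values are hypothetical.

```python
def adjusted_alpha(alpha, compliance, k_alpha):
    """Transmission rate raised as compliance C falls (assumed linear form)."""
    return alpha * (1.0 + k_alpha * (1.0 - compliance))


def adjusted_delta(delta, sentiment, k_delta):
    """Removed rate amplified by positive sentiment S_1 (assumed linear form)."""
    return delta * (1.0 + k_delta * sentiment)


def index_on_day(biweekly_values, day):
    """Piecewise-constant lookup: entry k applies to days 14k to 14k + 13.

    Days beyond the last survey reuse the most recent value.
    """
    period = min(day // 14, len(biweekly_values) - 1)
    return biweekly_values[period]


# Made-up biweekly survey indices, for illustration only.
compliance_series = [0.9, 0.7, 0.5]
sentiment_series = [0.8, 0.6, 0.4]

alpha_day20 = adjusted_alpha(0.4, index_on_day(compliance_series, 20), k_alpha=0.5)
delta_day20 = adjusted_delta(0.1, index_on_day(sentiment_series, 20), k_delta=0.3)
```

With full compliance (C = 1) the transmission rate is left unchanged, and with S_1 = 0 the removed rate is left unchanged, so the baseline rates are recovered at those limits.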

Optimization of the SEIR model by the GA
In addition to population group sizes, the SEIR model requires the COVID-19 infection rate, the conversion rate at which exposed individuals become infectious, and the removed rate at which infected individuals achieve full recovery [25]. The infection rate, which represents the daily probability of a susceptible individual contracting the virus, indicates the virus's transmission speed. The daily conversion rate, the likelihood that an exposed individual becomes infectious, affects the initial spread and control measures. The removed rate, denoting the daily probability of an infected individual's full recovery, varies with different medical conditions [26].
To determine these rates accurately, we compare the data predicted by the GA-SEIR model with the actual data obtained with the standard SEIR model. We propose using a GA to optimize the three probability values, aiming to minimize the error between the predicted and actual daily infection numbers [27,28].
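The error being minimized can be sketched as a function of the three rates. The snippet below is a simplified illustration: it uses a basic SEIR update with a one-day Euler step, omits vaccination and mortality terms, and uses a sum of squared errors as the error measure, which is an assumption since the error metric is not specified here. The function names are hypothetical.

```python
def simulate_infected(infection_rate, conversion_rate, removed_rate,
                      population, initial_infected, days):
    """Daily infected counts from a basic SEIR update (one-day Euler step)."""
    s = float(population - initial_infected)
    e = 0.0
    i = float(initial_infected)
    infected = []
    for _ in range(days):
        infected.append(i)
        new_exposed = infection_rate * s * i / population
        new_infectious = conversion_rate * e
        new_removed = removed_rate * i
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
    return infected


def squared_error(rates, observed, population, initial_infected):
    """Sum of squared differences between predicted and observed infections."""
    predicted = simulate_infected(*rates, population, initial_infected,
                                  len(observed))
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


# Synthetic "observations" generated from known rates, for illustration only.
true_rates = (0.4, 0.2, 0.1)
observed = simulate_infected(*true_rates, 200000, 10, 60)
error_at_truth = squared_error(true_rates, observed, 200000, 10)
error_off_truth = squared_error((0.3, 0.2, 0.1), observed, 200000, 10)
```

A GA would treat the three rates as the genes of an individual and use the negative of this error (or its reciprocal) as the fitness to maximize.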

Establishment of the GA
The GA is a parallel random search optimization method that simulates the genetic mechanism of nature and biological evolution. It operates by selecting individuals according to a fitness function and applying genetic operations such as selection, crossover and mutation. This process ensures that individuals with higher fitness values are retained, while those with lower fitness values are eliminated. As a result, each new group inherits the characteristics of the previous generation and tends to outperform it. The GA has been widely used because of its efficient heuristic search and parallel computing characteristics. The GA operates in five main steps [29]:
Step 1-Encoding: Prior to the search process, the GA encodes the solution data from the solution space into a genotype string structure within the genetic space. Different combinations of these string structures represent different points in the solution space.
Step 2-Initial population: Initial string structures are randomly generated; each is termed an 'individual', and together they form a group. The GA starts its evolution from this initial point.
Step 3-Selection: The fitness function, which varies with each problem, evaluates the individuals or solutions. Selection follows Darwin's survival of the fittest principle, giving better-adapted individuals higher reproductive chances.
Step 4-Crossover: This crucial step combines parental traits to create new individuals, fostering information exchange.
Step 5-Mutation: With low probability, the GA randomly alters part of an individual's string, mirroring biological mutation.
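The five steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a real-valued encoding in [0, 1], tournament selection, single-point crossover, uniform random mutation and one-individual elitism, all of which are choices made here for brevity. The function `run_ga`, its parameters and the toy fitness are hypothetical.

```python
import random


def run_ga(fitness, n_genes, pop_size=30, generations=20,
           crossover_prob=0.8, mutation_prob=0.1, seed=0):
    """Maximize `fitness` over gene lists with values in [0, 1]."""
    rng = random.Random(seed)
    # Steps 1-2: real-valued encoding and a random initial population.
    population = [[rng.random() for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        elite = max(population, key=fitness)
        best_history.append(fitness(elite))
        next_population = [elite[:]]  # elitism: keep the best unchanged
        while len(next_population) < pop_size:
            # Step 3: tournament selection of two parents.
            parent1 = max(rng.sample(population, 3), key=fitness)
            parent2 = max(rng.sample(population, 3), key=fitness)
            # Step 4: single-point crossover.
            child = parent1[:]
            if n_genes > 1 and rng.random() < crossover_prob:
                cut = rng.randrange(1, n_genes)
                child = parent1[:cut] + parent2[cut:]
            # Step 5: low-probability mutation of individual genes.
            for g in range(n_genes):
                if rng.random() < mutation_prob:
                    child[g] = rng.random()
            next_population.append(child)
        population = next_population
    best = max(population, key=fitness)
    best_history.append(fitness(best))
    return best, best_history


# Toy fitness: closeness to a made-up target triple of rates.
TARGET = (0.4, 0.2, 0.1)


def toy_fitness(genes):
    return -sum((g - t) ** 2 for g, t in zip(genes, TARGET))


best_rates, history = run_ga(toy_fitness, n_genes=3)
```

Because the best individual is copied unchanged into each new generation, the recorded maximum fitness never decreases, which mirrors the stabilizing maximum-fitness curve one would expect from such a run.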

Optimization of SEIR by GA
The limitation of the SEIR model lies in its initial inability to ascertain virus-related probabilities. Typically, these probabilities are derived from extensive data analysis, which is slow and prone to local-optimum errors [30]. This is where the fast training speed and strong global optimization ability of the GA are advantageous. By integrating the GA's global optimization with the prediction ability of the SEIR model for virus transmission, we obtain a new algorithm that offers both good prediction accuracy and rapid convergence.
First, the four population groups in the SEIR model are encoded to generate the initial population for the GA. After the fitness is calculated, the basic GA operations proceed. The process iterates until the minimum error condition is met [31]. If the condition is satisfied, the three probabilities calculated by the GA are returned to the SEIR model for the next prediction. If it is not satisfied, the procedure returns to the first step of the GA and the fitness is recalculated. The entire process is depicted in Fig. 2.

Software and tools used
For data processing and visualization, MATLAB (Data analysis, 2021b, UK) was utilized. This included importing the pandemic data and transforming it into a visual trend chart.

Ethical approval
The data in this study are publicly available without any personal identifying information.

Dynamic population trends in the standard SEIR model during the COVID-19 pandemic
We established a standard SEIR model to simulate the dynamic population trends during the COVID-19 pandemic using data from Wuhan, China. Figure 3 illustrates the dynamic trajectory of each population group over a 100-day period. The figure reveals a decrease in the number of susceptible individuals, an initial increase followed by a decrease in the number of exposed individuals, a sharp rise and subsequent fall in the number of infected individuals that marks the outbreak's peak, and a steady increase in the removed population, which includes those who have recovered from COVID-19 or are otherwise no longer part of the transmission chain.

SEIR fitness optimization by the GA
Using the GA, we optimized three key probabilities in the SEIR model (the infection rate, conversion rate, and removed rate) for a population of 200,000. Figure 4 shows the optimization over 20 generations: the maximum fitness stabilizes, indicating that an optimal parameter set was found, while the gradual increase in the average fitness demonstrates the effectiveness of the GA. From the seventeenth generation, we obtained these rates (detailed in Table 1), which serve as baseline probabilities for the subsequent GA-SEIR model, first without and then with socio-psychological factors.

Comparing GA-SEIR and SEIR predictions without socio-psychological factors
Using the GA-SEIR model with the three rates from Table 1, we forecasted the progression of COVID-19 among different population groups in Wuhan. The probabilities, derived from GA iterations, were integrated as known variables into the SEIR model. This yielded trends of epidemic progression for each group, as depicted in Fig. 5. This figure illustrates the number of susceptible, exposed, infected, and removed individuals over time,

Discussion
This study applied the SEIR model and its enhanced variant, the GA-SEIR model, both omitting and including socio-psychological factors, to simulate and predict the trajectory of COVID-19. The analysis focused on the dynamics of susceptible and infected individuals using daily COVID-19 data from diverse locations [5]. We analysed both the SEIR and GA-SEIR models in detail and, importantly, improved the GA-SEIR model by adding socio-psychological factors [32]. Our comparative study highlights the benefits and drawbacks of each model in the context of COVID-19. Incorporating socio-psychological factors into the GA-SEIR model enhances the realism of our predictions about how the COVID-19 epidemic spreads and is controlled. This improvement is aligned with our goal of developing models that not only accurately predict epidemic trends but also account for human behavior influenced by socio-psychological factors. Crucially, this enhanced predictive model has significant implications for future management and control strategies for respiratory infectious diseases such as COVID-19, providing insights that public health authorities can use to proactively implement containment and mitigation measures in response to potential outbreaks [32].
While the standard SEIR model is a key tool in epidemic modelling, it often lacks the detailed complexity needed to capture the dynamics of real-world diseases such as COVID-19. Qiu et al. [5] made significant contributions in this regard by enhancing the SEIR model with GA optimization and further incorporating factors such as the incubation period and the isolated population into the GA-SEIR model. These improvements significantly increased the model's accuracy in predicting epidemic trends, particularly in forecasting the timing and intensity of COVID-19 infection peaks.
Our study advances this field by being the first to integrate socio-psychological factors into the enhanced GA-SEIR model for analysing the COVID-19 epidemic. Our enhanced model depicts the interaction between human behavior and COVID-19 dynamics. Its ability to track the actual curves of the COVID-19 epidemic and identify potential outbreak hotspots highlights the critical role of human behavior in the trajectory of this disease, which is often overlooked in conventional SEIR modelling [10,14]. Notably, this study successfully predicted the resurgence of COVID-19 in mainland China in April 2023. This success underlines the efficacy of our methodological approach in incorporating diverse datasets, which supports a robust and comprehensive analysis and thereby enhances the strengths and applicability of our models in understanding and predicting COVID-19 dynamics.
Although this study represents a significant step forward by incorporating socio-psychological factors into the enhanced GA-SEIR model for analysing the COVID-19 epidemic, it is important to acknowledge its limitations. The current model, specifically developed for COVID-19, may face challenges when applied to other pathogens because of the unique transmission patterns and societal impacts of different diseases. Future adaptations of this model will need to adjust parameters to fit specific disease characteristics and public health responses. In enhancing our model, we aim to integrate a broader array of data sources, including direct surveys and healthcare reports, to improve geographical and demographic representativeness. Additionally, including details such as the disease's incubation period, along with specific health policies such as mask mandates, quarantine regulations, and vaccination efforts, could offer a more detailed view of how epidemics evolve across different areas [33]. Future studies will focus on making the model more realistic by considering a wider array of factors or by combining different disciplines and AI techniques for real-time data analysis and adjustment. These advancements are expected to yield more precise forecasts, aiding preparation for and response to a wide range of infectious disease outbreaks beyond COVID-19.

Conclusion
In our study, we compared the SEIR and GA-SEIR models using US COVID-19 data, focusing on model performance. While the SEIR model struggled with limited adaptability to behavioral and policy changes, the GA-SEIR model showed improved accuracy, especially in predicting COVID-19 peaks. Our key contribution was integrating socio-psychological factors into the GA-SEIR model, enhancing its ability to capture the interplay between human behavior and COVID-19 dynamics. This model notably predicted the resurgence of COVID-19 in mainland China in April 2023, demonstrating its robustness. The refined GA-SEIR model offers vital insights

Figure 1. Standard SEIR model for COVID-19 transmission: illustrating population dynamics in the susceptible (S), exposed (E), infectious (I), and removed (R) groups and periodic parameter comparisons during the epidemic. Note: S represents healthy individuals who have not been infected with the virus and lack immunity; E denotes individuals who are in the incubation period of infection after effective contact with an infected patient; I indicates individuals who are symptomatic and infectious; R includes individuals who have either recovered from or passed away due to the disease; N denotes the total population of the city, which is kept constant; α denotes the effective contact rate between susceptible persons and infected patients; β indicates the probability of the exposed group becoming infected; µ represents the natural mortality rate affecting all population groups, reflecting non-disease-specific deaths; µ_1 specifically denotes the mortality rate due to COVID-19, affecting the transition from I to R; ν indicates the vaccination rate of the susceptible group; and δ represents the cure rate.
https://doi.org/10.1038/s41598-024-66968-0
highlighting an inflection point at approximately day 35 post-outbreak, which indicates a significant decrease in infection numbers. Subsequently, we compared the infection data predicted by the GA-SEIR model with the actual observed data from the standard SEIR model. The comparison, depicted in Fig. 6, demonstrates the enhanced forecast accuracy of the GA-SEIR model compared with that of the standard SEIR model. Notably, the GA-SEIR model demonstrated robust precision, especially for the critical peak period between days 60 and 80 on the x-axis. Here, the predictions closely align with the actual data, demonstrating the effectiveness of the genetic algorithm in capturing the most crucial phase of the epidemic's trajectory.

External validation of the improved GA-SEIR model with socio-psychological factors for China's data
The refined GA-SEIR model with socio-psychological factors was applied to forecast COVID-19 epidemic trends in mainland China, and the results are shown in Fig. 7. Importantly, after the completion of our study, the model successfully predicted the resurgence of COVID-19 in mainland China in April 2023. The inclusion of socio-psychological factors in the model resulted in a predictive curve that more closely aligned with the actual epidemic data, especially during critical peaks. This enhancement significantly extends the original model's capabilities, considering behaviors influenced by socio-psychological factors, such as public response

Figure 2. Flow chart of the SEIR (susceptible, exposed, infected, and removed) model based on a genetic algorithm (GA): an iterative process is added to the basic flow of the SEIR model, and the goodness of fit calculation of four populations is iterated until the limit parameter closest to the true value is generated.

Figure 4. Fitness evolution of the GA-SEIR model for daily COVID-19 data from a specific region in the United States (June 2021-December 2022).Note: SEIR-Susceptible, exposed, infected, and removed; GA-Genetic algorithm.

Figure 5. Trends of the GA-SEIR model for different groups of daily COVID-19 data from a specific region in the United States (June 2021-December 2022).Note: SEIR-Susceptible, exposed, infected, and removed; GA-Genetic algorithm.

Figure 6. Comparison of the SEIR model and GA-SEIR model predictions for daily COVID-19 data from a specific region in the United States (June 2021-December 2022). Note: SEIR-Susceptible, exposed, infected, and removed; GA-Genetic algorithm. The x-axis signifies either the timeline or various stages of disease progression. The 'Predict' line is shown as a solid blue line, indicating the model's predictions, while the 'True' line, depicted as a black dashed line, represents the actual observed data using the standard SEIR model.

Table 1 .
to health policies and adherence to guidelines. Consequently, this refinement in the model enhances its ability to pinpoint potential outbreak points, shedding light on the dynamics of virus transmission influenced by social and psychological behavior.